{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "4facdf7f-680e-4d28-908b-2b8408e2a741",
      "metadata": {},
      "source": [
        "# How to pass multimodal data directly to models\n",
        "\n",
        ":::info Prerequisites\n",
        "\n",
        "This guide assumes familiarity with the following concepts:\n",
        "\n",
        "- [Chat models](/docs/concepts/chat_models)\n",
        "\n",
        ":::\n",
        "\n",
        "Here we demonstrate how to pass multimodal input directly to models. \n",
        "We currently expect all input to be passed in the same format as [OpenAI expects](https://platform.openai.com/docs/guides/vision).\n",
        "For other model providers that support multimodal input, we have added logic inside the class to convert to the expected format.\n",
        "\n",
        "In this example we will ask a model to describe an image."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "id": "0d9fd81a-b7f0-445a-8e3d-cfc2d31fdd59",
      "metadata": {},
      "outputs": [],
      "source": [
        "import * as fs from \"node:fs/promises\";\n",
        "\n",
        "import { ChatAnthropic } from \"@langchain/anthropic\";\n",
        "\n",
        "const model = new ChatAnthropic({\n",
        "  model: \"claude-3-sonnet-20240229\",\n",
        "});\n",
        "\n",
        "const imageData = await fs.readFile(\"../../../../examples/hotdog.jpg\");"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "4fca4da7",
      "metadata": {},
      "source": [
        "The most commonly supported way to pass in images is to pass it in as a byte string within a message with a complex content type for models that support multimodal input. Here's an example:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "id": "ec680b6b",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "This image contains a hot dog. It shows a frankfurter or sausage encased in a soft, elongated bread bun. The sausage itself appears to be reddish in color, likely a smoked or cured variety. The bun is a golden-brown color, suggesting it has been lightly toasted or grilled. The hot dog is presented against a plain white background, allowing the details of the iconic American fast food item to be clearly visible.\n"
          ]
        }
      ],
      "source": [
        "import { HumanMessage } from \"@langchain/core/messages\";\n",
        "\n",
        "const message = new HumanMessage({\n",
        "  content: [\n",
        "    {\n",
        "      type: \"text\",\n",
        "      text: \"what does this image contain?\"},\n",
        "    {\n",
        "      type: \"image_url\",\n",
        "      image_url: {\n",
        "        url: `data:image/jpeg;base64,${imageData.toString(\"base64\")}`},\n",
        "    },\n",
        "  ],\n",
        "})\n",
        "const response = await model.invoke([message]);\n",
        "console.log(response.content);"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "8656018e-c56d-47d2-b2be-71e87827f90a",
      "metadata": {},
      "source": [
        "Some model providers support taking an HTTP URL to the image directly in a content block of type `\"image_url\"`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "id": "a8819cf3-5ddc-44f0-889a-19ca7b7fe77e",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "The weather in the image appears to be pleasant and clear. The sky is mostly blue with a few scattered clouds, indicating good visibility and no immediate signs of rain. The lighting suggests it’s either morning or late afternoon, with sunlight creating a warm and bright atmosphere. There is no indication of strong winds, as the grass and foliage appear calm and undisturbed. Overall, it looks like a beautiful day, possibly spring or summer, ideal for outdoor activities.\n"
          ]
        }
      ],
      "source": [
        "import { ChatOpenAI } from \"@langchain/openai\";\n",
        "\n",
        "const openAIModel = new ChatOpenAI({\n",
        "  model: \"gpt-4o\",\n",
        "});\n",
        "\n",
        "const imageUrl = \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\";\n",
        "\n",
        "const message = new HumanMessage({\n",
        "  content: [\n",
        "    {\n",
        "      type: \"text\",\n",
        "      text: \"describe the weather in this image\"},\n",
        "    {\n",
        "      type: \"image_url\",\n",
        "      image_url: { url: imageUrl }\n",
        "    },\n",
        "  ],\n",
        "});\n",
        "const response = await openAIModel.invoke([message]);\n",
        "console.log(response.content);"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "1c470309",
      "metadata": {},
      "source": [
        "We can also pass in multiple images."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "id": "325fb4ca",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Yes, the two images are the same.\n"
          ]
        }
      ],
      "source": [
        "const message = new HumanMessage({\n",
        "  content: [\n",
        "    {\n",
        "      type: \"text\",\n",
        "      text: \"are these two images the same?\"\n",
        "    },\n",
        "    {\n",
        "      type: \"image_url\",\n",
        "      image_url: {\n",
        "        url: imageUrl\n",
        "      }\n",
        "    },\n",
        "    {\n",
        "      type: \"image_url\",\n",
        "      image_url: {\n",
        "        url: imageUrl\n",
        "      }\n",
        "    },\n",
        "  ],\n",
        "});\n",
        "const response = await openAIModel.invoke([message]);\n",
        "console.log(response.content);"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "bad38378",
      "metadata": {},
      "source": [
        "## Next steps\n",
        "\n",
        "You've now learned how to pass multimodal data to a modal.\n",
        "\n",
        "Next, you can check out our guide on [multimodal tool calls](/docs/how_to/tool_calls_multimodal)."
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Deno",
      "language": "typescript",
      "name": "deno"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "typescript",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.1"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
